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Abstract 

We consider massively parallel discrete event simulations where the communication 
topology among the processing elements is a complex graph. In the case of regular 
topologies we review recent results on virtual time horizon management. First we 
analyze the computational scalability of the conservative massively parallel update 
scheme for discrete event simulations by using the analogy with a well-known surface 
growth model, then we show that a simple modification of the regular PE commu- 
nication topology to a small-world topology will also ensure measurement scalability. 
This leads to a fully scalable parallel simulation for systems with asynchronous dy- 
namics and short-range interactions. Finally, we present numerical results for the 
evolution of the virtual time horizon on scale-free Barabasi-Albert networks serving 
as communication topology among the processing elements. 
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1 Introduction to the scalability problem of massively 
parallel discrete-event simulations 



The description and understanding of complex systems dynamics is in most cases impossible 
via analytic methods. The density of problems that are rigorously solvable with analytic 
tools is vanishingly small in the set of all problems. The only way one can obtain system level 
understanding of such problems is through direct simulation. The class of complex systems 
considered here are made of a large number of interacting individual elements with a finite 
number of attributes, or local state variables, each assuming a countable number (typically 
finite) of values. The dynamics of the local state variables are discrete events occurring 
in continuous time. Between two consecutive updates, the local variables stay unchanged. 
Another important assumption we make is that the interactions in the underlying system to 
be simulated have finite range. Examples of such systems include: magnetic systems (spin 
states and spin flip dynamics), surface growth via molecular beam epitaxy (height of the 
surface at a given point, measured from a growth level, molecular deposition and diffusion 
dynamics); epidemiology (health of an individual, health state change due to infection, or 
recovery); financial markets (wealth state, buy/sell dynamics), wireless communications, or 
queueing systems (number of jobs, job arrival dynamics). 

Often, as the case studied here, the dynamics of such systems is inherently stochastic and 
asynchronous. The simulation of such systems is rather non-trivial and in most cases the 
complexity of the problem requires simulations on distributed architectures, defining the field 
of Parallel Discrete- Event Simulations (PDES) [HIS El- Conceptually, the computational 
task is divided among N processing elements (PE-s), where each processor evolves the 
dynamics of the allocated piece. Due to the interactions among the individual elements of 
the simulated system (spins, atoms, packets, calls, etc.) the PE-s must coordinate with a 
subset of other PE-s during the simulation. For example, the state of a spin can only be 
updated if the state of the neighbors is known. However, some neighbors might belong to 
the computational domain of another PE, thus message passing will be required in order 
to preserve causality. In the PDES schemes we analyze, update attempts are self-initiated 
|1] and are independent of the configuration of the underlying system E]- Although 
these properties simplify the analysis of the corresponding PDES schemes, they can be 
highly efficient [7j and are readily applicable to a large number problems in science and 
engineering. Further, the performance and the scalability of these PDES schemes become 
independent of the specific underlying system i.e., we learn the generic behavior of these 
complex computational schemes. 

The update dynamics, together with the information sharing among PE-s, make the 
parallel discrete-event simulation process a complex dynamical system itself. In fact, it 
perfectly fits the type of complex systems we are considering here: the individual elements 
are the PE-s, and their states (local simulated time) evolve according to update events which 
are dependent on the states of the neighboring PE-s. 

With the number and size of parallel computers on the rise, the problem of designing 
efficient parallel algorithms, or update schemes becomes increasingly important. In passing, 
we can mention a few examples of large parallel computers: the 9472-node ASCII Red 
at Sandia, the 12288-node QCDSP Terafiop Machine at Brookhaven, the 8192-node IBM 
ASCII White with 12.3 Terafiops. IBM is currently building the Blue Gene/L with 200 
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Teraflops, with over 10^ nodes. As a matter of fact the largest supercomputer ever built is 
by Nature itself: the brain, which does an immense parallel computing task to sustain the 
individual. In particular the human brain has 10^^ PE-s (neurons) each with an average 
of 10^ synaptic connections, creating a bundle on the order of 10^^ "wires" jammed into a 
volume of approximately 1400 cm'^. 

The design of efficient parallel update schemes is a rather challenging problem, due to 
the fact that the dynamics of the simulation scheme itself is a complex system, whose prop- 
erties are hard to deduce using classical methods of algorithm analysis. Here we present a 
less conventional approach to the analysis of efficiency and scalability for the class of mas- 
sively parallel conservative PDE-s schemes, by exactly mapping the parallel computational 
process itself onto a non-equilibrium surface growth model |H]. This allows us to translate 
the questions about efficiency and scalability into questions formulated in terms of certain 
topological properties of this non-equilibrium surface. Then, using methods from statistical 
mechanics, developed some time ago to study the dynamics of such surfaces (in a com- 
pletely different context), we solve the scalability problem of the computational PDES-s 
scheme |H1 Ej- Similar connections between computational schemes and complex systems 
behavior have recently been made jTUl ^] for rollback-based PDES-s algorithms [12] and 
self-organized criticality [T^ . 

Since one is interested in the dynamics of the underlying complex system, the PDES 
scheme must simulate the 'physical time' variable of the complex system. When the sim- 
ulations are done on a single processor machine, a single (global) time stream is sufficient 
to "label" or time-stamp the updates of the local configurations, regardless whether the 
dynamics of the underlying system is synchronous or asynchronous. When simulating asyn- 
chronous dynamics on distributed architectures, however, each PE generates its own phys- 
ical, or virtual time, which is the physical time variable of the particular computational 
domain handled by that PE. Due to the varying complexity of the computation at different 
PE-s, at a given wall-clock instant the simulated virtual times of the PE-s can differ, a 
phenomenon called "time horizon roughening" . We denote the simulated, or virtual time at 
PE i measured at wall-clock time t, by Ti{t). For non-interacting subsystems the wall-clock 
time t is directly proportional to the (discrete) number of parallel steps simultaneously per- 
formed on each PE, also called the number of Monte-Carlo steps (MCS) in dynamic Monte 
Carlo simulations. Without altering the meaning, t from now on will be used to denote 
the number of discrete steps performed in the parallel simulation. The set of virtual times 
{'Tiif)}f=i forms the virtual time horizon of the PDES-s scheme after t parallel updates. 

In conservative PDES-s schemes fl], a PE will only perform its next update if it can 
obtain the correct information to evolve the local configuration (local state) of the underlying 
physical system it simulates, without violating causality. Otherwise, it idles. Specifically, 
when the underlying system has nearest-neighbor interactions, each PE must check with 
its "neighboring" PEs (mimicking the interaction topology of the underlying system) to 
see if those are progressed at least up to the point in virtual time where the PE itself did 
jni El • Based on the fundamental notion of discrete-event systems that the state of a local 
state variable remains unchanged between two successive update attempts, the above rule 
guarantees the causality of the simulated dynamics IH] . A simple example illustrating this 
is shown in Figure ^ One can consider, for example, a magnetic system as the underlying 
physical system, where the spins are arranged in the sites of a one-dimensional lattice, and 
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Figure 1: A simple diagram to illustrate the conservative PDES-s scheme for a one- 
dimensional system with nearest-neighbor interactions. 



that a single spin is handled by a single PE (for more realistic and efficient implementations 
see 13 ini El)- In Fig- Q which shows the distribution of the virtual simulated times at a 
given wall-clock instant t, the only PE that can update from the set {i — + 1} is in 
site i, since the state of the neighboring spins at sites i ± 1 are already known. However, 
PE-s i ±1 cannot update their spin states at wall clock instant t, because, the state of the 
neighboring spin i at their simulated times (at rj_i and Tj+i) is not known yet. In other 
words PE i can only update at wall-clock instant t if rj(t) < min{rj_i(t), rj_i(t)}, i.e., its 
virtual time is a local minimum among the virtual times of its neighboring PE-s. It is easy 
to see that the same conclusion holds for arbitrary PE topologies. Let the topology for the 
communication among the processing elements be symbolized by a graph G{V, E), where V 
is the vertex set of nodes and E is the edge set of G. Given a node i G V{G), we denote 
by S^^^ the set of first order neighbors on G of i. Then, node (PE) i can update its state, 
in the conservative PDES-s scheme iff: 

TP{t)< min{rf\t)} ,. = l,..,Ar. (1) 

In the following, the set of (active) nodes which obey condition at time t, will be denoted 
by A{t). Now we are in the position to formulate the scalability problem of PDES-schemes 
for systems with asynchronous dynamics. For the PDES-s scheme to be fully scalable, the 
following two criteria must be met: (i) the virtual time horizon must progress on average at 
a nonzero rate, and (ii) the typical spread of the time horizon should be finite, as the number 
of PE-s goes to infinity. When the first criterion is ensured for large enough times t, the 
simulation is said to be computationally scalable, and it just means that when increasing the 
size of the computation to infinity, while keeping the average computational domain/load 
on a single PE the same, the simulation will progress at a nonzero rate. However, as we will 
show below, increasing the system size, the spread in the time horizon can diverge, severely 
hindering frequent data collection about the state of the simulated system. Specifically, 
when one requires to take a measurement of some physical property of the simulated system 
at (virtual, or simulated) physical time r, we have to wait (in wall-clock time) until all the 



4 



virtual simulated times at all the PE-s pass through the value of r. Thus, in order to collect 
system-wide measurements from the simulation, we incur a waiting time proportional to the 
spread, or width of the fluctuating time horizon. For PDES-s schemes for which the spread 
diverges with system size, the waiting time for the measurements will also diverge, and the 
scheme is not measurement scalable. When condition (ii) is fulfilled for large enough times 
t, we say that the PDES-s scheme is measurement scalable. 

The scalability criteria above can simply be formalized in terms of the properties of the 
virtual time horizon, {Ti'^\t)}fLi. The average of the time horizon after t parallel steps is: 



At a given (wall-clock) time t the only PE-s that can make progress, i.e., are not in idle, are 
those which have simulated virtual times obeying condition (PJ. Thus, the rate of progress 
of the time horizon average becomes: 



(t + l)-r(^)(t) = l 5: [rf)(t + l)-rf)(t)] . (3) 



The difference in the square brackets on the r.h.s of Q is the physical time elapsed between 
two consecutive events in the physical domain simulated by the Ith PE, and it is determined 
by the physical process responsible for the stochastic dynamics of the simulated complex 
system. If we replace the time intervals in square brackets in (jS)) with their (obviously finite) 
average value A, we obtain that the average progress rate of the time horizon, or average 
utilization {u^^\t)) = {T^^\t + 1) — T^^\t)) is proportional to the number oi non-idling, or 
active PE-s. The average (■) is taken over the stochastic event dynamics, which is assumed 
to be the same at all sites. For many cases, the A factor is independent on (due to the 
finite range of the interaction in the complex system), so the computational efficiency, or 
average utilization of the simulation can simply be identified with the average density of the 
active PE-s: 

. M ,4, 

where |^^'^''(t)| denotes the number of elements of the set A^'^\t). Thus, the PDES-s scheme 
is computationally scalable, if there exists a constant c > 0, such that: 

Af— too 

The measurement scalability of the PDES-s scheme, is characterized by the spread of the 
virtual time horizon. Instead of dealing with the actual spread (difference between the 
maximum and minimum values) we shall consider the average "width" of the time horizon 
defined as: 

N 



([«^^^^]'W) = ^E[-f W--^^Ht)]' . (6) 
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A PDES-s scheme is measurement scalable, if there exists a constant M > 0, such: 



([«;(^)]^(oo)) = hm j^J2[^f\t)-r^''\t)]\ M . 



(7) 



In reality, the number of PEs or the simulation time t can never be taken to infinity, so 
for practical purposes, the scalability is deduced from the scaling behavior of the quantities 
for long times and for large number of PEs. The setup presented above is perfectly suitable 
to establish a mapping between non-equilibrium surface growth models ^H] and conservative 
PDES-s schemes. We discuss extensively this mapping in the next section. 

The paper is organized as follows: in Section |21 we discuss the scalability of the computa- 
tional phase and the failure of scalability of the measurement phase of the basic conservative 
scheme on regular topologies. Then we show how a simple modification of the communi- 
cation topology (from a regular lattice to a small-world structure) leads to a fully scalable 
PDES-s scheme. In section El we study the scalability problem on scale-free network topolo- 
gies. Section mis devoted to conclusions. 

2 Scalability of conservative PDES-s schemes on reg- 
ular and small-world topologies 

In many large complex systems the stochastic event dynamics can be characterized by a 
Poisson distributed stream. For example, in an Ising magnet with single spin-flip Glauber 
dynamics [7] the spin-flip attempts are Poisson distributed events, or in wireless cellular 
communications the call arrivals also obey Poisson statistics etc. In the following 
we restrict ourselves to such Poisson distributed stochastic processes for event dynamics, 
however numerical simulations show, that our conclusions for scalability hold for a large 
class of other stochastic distributions, as well. The evolution of the virtual time horizon 
incorporating condition (0) for Poisson asynchrony is given by the equation: 



Here 6{x) is the Heaviside step function, and r]i{t) the Poisson distributed virtual time 
increment at PE i, and time t is. These increments are drawn at random, independently of 
i and t, and of the existing time horizon. 

2.1 The basic conservative scheme on regular topologies 

Next, we consider the basic conservative scheme, which is defined on regular, square lattice 
communication topologies, in d dimensions, so that = L*^. For brevity, we drop from the 
superscript (G) in the notation for Ti(t). In particular, we first illustrate our analysis on the 
simplest regular topology, that of a regular one- dimensional lattice, with periodic boundary 
conditions {G is a ring). Later we discuss the general, c?-dimensional case. The evolution 
equation on the ring becomes simply: 





Tiit+l) 



ri{t)+r]i{t)9{T,^i{t) 



T,{t))9{Ti+i{t)-Ti{t)). 



(9) 
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with the boundary conditions t^v+i = t^v = tq. The total number of active sites/PE-s is 
thus given by \ A(t) \ = Xlili ^(''l-i(^) ~ 'Ti{t))0{'Ti+i(t) — Tiit)) so the average utihzation (jlj) 
becomes: 

{u{L,t)) = -J2{ein_,(t)-nitmn^^it)-nm (lO) 

i=l 

The average (■) is performed over the random variables {?7j(t')} i=i,..,L which have an expo- 

t'=i,..,t 

nential distribution, Prob{x < rj < x + 6x} = ^ dye'"^. In spite of the simple appearance 
of the dynamics 0, and the exponential (or Poisson) stochastic dynamics at nodes, calcu- 
lating the average utilization ()10|) is very difficult. Even a rigorous proof for the existence 
of the lower bound (0) using direct methods is still an open problem. 

Here we present a different approach, by mapping first the problem to a non-equilibrium 
surface grown via molecular beam epitaxy (where atoms or molecules are deposited from 
vapors, or beams onto the surface) model. The quantities brought in analogy are: the i-th 
PE is the site i in the substrate; the number of parallel updates t is the number of deposited 
monolayers; rj(t) is the height hi{t) at site i and time t; a virtual time increment of r]i{t) 
at PE i in the t-th step, corresponds to a material "rod" of length rii{t) deposited onto the 
surface, see Figure (j21). The length of the rod is a Poisson distributed random variable. 
During the t-th update, the rods are deposited only in local minima of the surface. The 
utilization of the PDES-s scheme corresponds to the the density of local minima of the 
growing surface. Even though the length of the rods are random independent variables, the 
fact that they can only be deposited in local minima will generate lateral correlations into 
the surface fiuctuations, and makes the problem hard to compute exactly. The rods are 



V 

V 



T T i 

:tive Active 

Figure 2: A simple surface growth model on a 1-d substrate corresponding to the basic 
conservative PDES-s scheme. 

deposited onto the surface in a parallel update scheme: after all local minima are updated 
(deposited onto) the the time t is incremented by unity. We will coin the surface growth 
analog of our basic conservative PDES-s scheme , the 'MPEU' model (Massively Parallel 
Exponential Update Model). 
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Both the utihzation (density of minima) and the width of the time horizon are quantities 
characterizing the fluctuations of the growing surface. The type of fluctuations can be 
classified into universality classes, each class having distinct statistical properties. Studying 
the PDES-s scheme as a surface growth model, we can describe its fluctuations and identify 
the surface growth universality class it belongs to. In order to do that we first introduce 
the slope variables, (pi = Ti — rj_i. Provided Ti(t) is a local minimum, a rod deposited 
of length rji corresponds to taking an amount of rji from and adding it to (pi, since 
(pi(t + 1) = Ti(t) - Ti_i(t) + r]i(t) and (pi+i{t + 1) = - Ti(t) - r]i{t), i.e., in the surface 

of slopes, {(pi}f=i the dynamics is biased surface diffusion, given by the equation: 

(Pi{t + 1) - (Pi{t) = r],{t)e{-(p,{t))e{(Pi^i{t)) - 7],^i{t)e{~(Pi^,{t))e{(p,{t)) (ii) 

with the constraint Yli=i 0i = generated by the periodic boundary conditions in the r 
variables. In terms of the local slope variables, the expression for the average density of 
minima, or average utilization becomes: {u{L,t)) = j'^f^i {9{—(pi{t))6{(pij^i{t))). Transla- 
tional invariance implies (no node is statistically special) {u{L,t)) = {9{—(pi(t))9{(pi+i(t))) 
for any i = 1,2,..,L. From it follows that (rj(t + 1)) — {Ti{t)) = {u{L,t)), thus, the 
average rate of propagation of the MPEU surface is identical to the average utilization of 
the PDES-s scheme. It is also easy to see that in the slope language it is identical to the 
average current in the ring. Next we perform a naive coarse graining by using the represen- 
tation 6'(0) = lim^^o I [1 + tanh(0/fi;)], and keeping only the terms up to the order {(p/ n). 
Performing this coarse graining, one obtains: 

(0,,(t + 1)) - (0,(t)) = ^{(P^+l{t) - 2(P,{t) + 0,_i(t)) - -^(0.(0m - 0.-i)) . (12) 

Strictly speaking all of the (0//t)", n = 1,2, ... terms are divergent, but taking the proper 
continuum limit and introducing an appropriately scaled bias, the only relevant terms can 
be shown to be those appearing in Eq. ()12|) jTB| . In the continuum limit, one thus, obtains 
for the coarse-grained field: 

d , , , d 



dt ^ dx"^ ^ dx 

where A is a parameter related to the coarse-graining procedure. The above nonlinear 
partial differential equation (fTSjl is known as the nonlinear biased diffusion, or the Burgers 
equation [T7j. Returning to the coarse-grained equivalent of the height, or virtual times, f, 
via (p = df/dx, we obtain the Kardar-Parisi-Zhang (KPZ) equation (iSj: 

df _ d'^f / df\^ 

dt~d^~ \d^) ' ^ ^ 

To capture the fluctuations, one typically adds a delta-correlated noise term ^{x,t), to the 
right hand side, conserved for Eq. ()13p (i.e., j C,dx = 0), and non-conserved for Eq. (fT^ . 
It is important to note that we obtained the KPZ equation as a result of a coarse-graining 
procedure. While this results in the "loss" of some of the microscopic details for the original 
growth model on the lattice, Eq. (fT^ with noise added captures the long-wavelength behav- 
ior of the MPEU model. Thus, we claim that virtual time horizon for the basic conservative 
PDES scheme exhibit kinetic roughening and it belongs to the KPZ universality class. Iden- 
tifying the universality class of a model is one of the main objectives and used extensively in 
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surface science to classify fluctuation statistics. Our procedure above indicates that the long- 
wavelength statistics of the fluctuations of the time horizon for the basic conservative PDES- 
s scheme is in fact captured by the nonlinear KPZ equation. In one dimension a steady-state 
for the surface fluctuations is reached (in the long time limit) for any finite system-size and 
it is governed by the Edwards- Wilkinson (EW) Hamiltonian Hew oc J dx (|^) (see, e.g., 
|19j). The corresponding surface is a simple random- walk surface, where the slopes are 
independent random variables in the steady state. This means that out of the four local 
configurations of slopes around a point (down-up, down-down, up-up, up-down) only one 
contributes in average to a minimum (down- up), and since they are all equally likely, we 
conclude that {uew{L oo,t — oo)) = 1/4 = 0.25. Our numerical simulations for the 
MPEU model, see Fig^) indicate a value of {u{L ^ cx),t ^ oo)) = 0.24641 ± (7 x 10"^) 
a value close, but definitively not the same as for the simple random walk surface. The 
reason for the obvious difference is that the coarse-grained version and the original micro- 
scopic model are not identical over the whole spectrum of wavelengths of the fluctuations. 
The coarse-graining procedure preserves the statistics of the long-wavelength modes, but it 
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Figure 3: a) Steady state average utilization as a function of the number of PE-s L in a one- 
dimensional ring geometry; b) shows that the full distribution for the rescaled utilization in 
the steady state u = {u{L) — {u{L)))/aL can be collapsed onto the normal distribution, c) 
Slope-slope correlation function. 



looses some information on the short-wavelength ones. In particular the density of minima 
is heavily influenced by the short wavelengths (by how "fuzzy" the interface is). However, 
the density of minima cannot vanish in the thermodynamic limit: a zero density of local 
minima would imply that it is zero on all length-scales which would contradict the fact that 
it belongs to the EW universality class. The fact that the steady-state of the MPEU model 
belongs to the EW universality class guarantees that the local slopes are short-range corre- 
lated [Fig.Sc)], and that the finite-size corrections for the density of local minima (average 
propagation rate of the surface) follows a universal scaling form j20j : 

(u(L,oo)) ~ (n(oo,oo)) + . (15) 

Here a is the roughness exponent (equals to 1/2 for the EW universality class) , characteriz- 
ing the macroscopic surface-height fluctuations, as described in detail in the next paragraph. 
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Figure 121 confirms this scaling behavior. Further, calculating the spread in the average uti- 
lization the steady state as function of system size, cr£ = {u'^{L, oo)) — {u{L, oo))"^, we obtain 
oc L~^/^. These findings suggest that the utilization is a self-averaging macroscopic quan- 
tity: its full distribution -Pl(m) for large L is a Gaussian [Fig. Eb)]- 




t x=w /<w > 



Figure 4: a) The width of the time horizon fluctuations show dynamical scaling and indicate 
KPZ universality, b) The scaling function for the steady-state width distribution follows 
the scaling function for the EW (one-dimensional KPZ) universality class. 

In the following we show numerical results supporting our claim that the MPEU model 
belongs to the KPZ universality class. One of the fundamental characteristic quantities 
strongly influenced by the long-wavelength modes is the average width of the height fluctu- 
ations, as given in Eq. ®. As the surface grows due to deposition, after an initial transient 
the growth of the width will grow as a power law {w'^{L,t)) ~ along with the lateral 
surface correlations ^|[(L,t) ~ t^^^, until the correlations reach the system size (^|| = L) at 
a crossover time (see, e.g., Ref. ^H])- After the crossover time (for any finite system 
L) the surface fluctuations are governed by a steady-state distribution and the width scales 
as 

(w;'(L,oo)) -L^" . (16) 

The exponent (3 is called the growth exponent, a is the called roughness exponent and z is 
called the dynamic exponent in the surface growth literature |^ . It is easy to show that the 
three exponents are not all independent, and a = z(3 holds jilSJ. Also, these scaling forms 
allow one to collapse all the different curves for the width onto a single function in the scaling 
regime, expressing the dynamic scaling property of the width: {w'^{L,t)) = L'^'^f(t/L^) (/ 
is easy to read of after comparing it to the scaling behavior). For the KPZ interface, the 
analytically obtained exact values for the exponents are: f3 = 1/3, a = 1/2 and z = 3/2. 
Fig. 0] shows the numerically measured scaling properties for the width of the MPEU model, 
the numerically obtained values for the exponents /? = 0.326 ± 0.005, a = 0.49 ± 0.01 for 
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large system sizes {L = 10^) confirm the KPZ behavior, including the dynamical scaling 
property (inset). Another confirmation for the EW universality class in the steady state, 
comes from measuring the full width distribution P{w'^). For systems belonging to the EW 
universality class and having the same type of boundary conditions imposed, the width 
distribution has a universal scaling formpT] P[w'^) = -j^^ (i^j^) ^^^^ 

$(x) = ^ V(-l)""'nV#"'" , (17) 

^ n=l 

where the above scaled distribution is for the case of periodic boundary conditions. Figure 
I3]d) is a confirmation that MPEU indeed belongs to the steady state of the EW class, imply- 
ing that the average utilization (density of local minima) approaches a non-zero, finite value 
in the thermodynamic limit (0) as reflected by Eq. ()15p. Therefore, the basic conservative 
scheme is computationally scalable. [For an in-depth and systematic analytical calculation 
of the density of minima (utilization) for a number of surface growth models (including the 
EW class) see Ref. [221 ■] The measurement phase of the basic conservative scheme, how- 
ever, is not scalable, as indicated by the power-law divergence of the width in the long-time 
large L limit, Eq. (fTB|) . For higher-dimensional topologies, using universality arguments, the 
conclusion remains the same: the basic conservative PDES-s is computationally scalable, 
but the measurement phase may not be, depending on the upper critical dimension ^H] of 
the surface (see Ref. [23 121) ■ 

2.2 The conservative scheme on small world networks 

From the previous section it follows that the average width of the fluctuations scales in the 
steady state as {w'^{L,t=oo)) ~ L^" = L , i.e., it grows linearly with the system size. This 
means that the basic conservative PDES-s scheme is not measurement scalable. Standard 
methods to control the width of the virtual time horizon in a PDES scheme utilize some 
kind of a windowing technique [TJ. That is, the height of the local simulated time at any 
PE cannot progress beyond an appropriately chosen and regularly updated "cap" , measured 
from the global minimum of the time horizon [23]. Thus, a PDES scheme with a moving 
window relies on frequent global synchronizations or communications, which, depending on 
the architecture, can get costly for large number of PEs. Here we show how to modify 
the original conservative scheme such that the scheme is also measurement scalable without 
global "intervention" p. 

The divergence of the width of the surface fluctuations is closely related to the fact 
that the lateral surface correlations also grow with the system size. In particular, for the 
one-dimensional EW surface in the steady state, for large L (and fixed /) 

{fifi+i) oc (L, oo) - |/| , (18) 

where fj are the coarse-grained height fluctuations measured from the mean and ^||(-^^, oo) ~ 
L. Thus, (w^(L, oo)) = (ff) oc ^||(L, oo) ~ L. The "height-height" correlations can be 
characterized by introducing the structure factor for the heights: 

S^^\k) = Uhf-k) (19) 
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where k = 27m/ L, n = 0,1, 2, .., L — 1 is the wave-vector, and r^, = ^j=o ^ — t) is the 

discrete spatial Fourier transform of the fluctuations of virtual time horizon. Then 

m^i) = jJ2e^'^S^^\k) (20) 



and 



(21) 



Since the universality class for the time horizon evolution is EW, it follows that the expected 
behavior for the steady-state structure factor for small wave-numbers is 



5W(/c) oc 1 



(22) 



[see, e.g., Eq. (11) in Ref. [22j). Indeed, this is also confirmed by our direct simulation 
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Figure 5: Steady-state structure factors for the virtual time horizon for the a) basic conser- 
vative scheme on a regular one-dimensional lattice (p=0) and b) for the small-world scheme 
with p=0.1. 

results, shown Fig. |3^). This form of the structure factor implies that there are no length- 
scales other than the lattice constant and the systems size, and thus, the correlation length 
and the width diverge in the thermodynamic limit, as also can be seen by directly evaluating 
Eq. (ED). 

To de-correlate the surface fluctuations, we modify the communication topology in the 
following way 0: for every node i, at the onset of the simulation, we introduce one extra 
quenched random communication link r{i). Together with the existing regular topology, 
these extra communication links will form a small- world graph |2Hl 123 I2H] • Note that in 
our speciflc construction of the small-world network, each node has exactly one random 
connection and r{r{i))=i, so that there are exactly L/2 random links distributed. The 
updating on PE i will obey the following probabilistically chosen condition: 



Ti < 



min{rj_i, Tj+i, r^.(j)} with probability p 
min{rj_i, Tj+i} with probability 1—p 



(23) 



The PE actually performs the update (generate the virtual time of the next update, or deposit 
the rod at i in the MPEU surface) if condition (j^^ is fulfllled. This means that for sites that 
would normally be updated within the basic conservative scheme, i.e., < min{rj_i, Tj+i} 
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Figure 6: Steady-state virtual time horizon snapshots with L = 10, 000 after t = 10^ parallel 
algorithmic steps (Monte-Carlo sweeps) for a) the basic conservative scheme {p = 0) and 
b) the small- world scheme p = 0.1. Note that the vertical scales are the same in a) and b) 
(plotted in arbitrary simulated time units [stu]). 



the PE will make an extra check for the condition Tj < rj,(j) with probability p. The 
parameter p allows us to tune the scalability properties of the corresponding PDES scheme 
on the quenched small-world network continuously from the pure basic conservative scheme 
{p = 0) to the "fully" small-world conservative scheme {p = 1). These occasional extra 
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Figure 7: a) The average steady-state width and b) the utilization for various p values. In 
addition to ensemble averages over 10 realizations of the random links (solid symbols) a 
single realiztion is also shown (open symbols). The solid straight line has a slope of 1/2 and 
represents the asymptotic one-dimensional KPZ power-law divergence of the width for the 
basic conservative scheme (p = 0). 

checkings through the quenched random links are not necessary for the faithfulness of the 
simulation. It is merely used to synchronize the PEs in such a way that the fluctuations of 
the time horizon remain bounded in the infinite system-size limit. Most importantly, as the 
width is reduced from "infinity" (or some large number proportional to L for finite number 
of PEs) to a finite controlled value, the utilization still remains bounded away from zero. 
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To support this statement, we first use the same coarse-graining procedure used to derive 
the KPZ equations as the continuum counterpart of the MPEU model. For the small-world 
topology we obtain 

* -7(P)? H- A f^V + . (24) 



dt dx"^ \dx 

with 7(p) = for p = 0, and 7(p) > for < p < 1. This implies that the extra checking 
along the random links introduces a strong relaxation (first term on the rhs of ()24j)) for 
the long-wavelength modes of the surface fluctuations, resulting in a finite width. A more 
transparent picture is gained if we look at the steady-state structure factor (fTUI). Restricting 
our attention to the linear terms in Eq. (j24j) we obtain 

5W(A:)oc^-. (25) 
7 -|- 

In this approximation, the lateral correlation length ^|| scales as 1/^/7, and remains finite 
(and system-size independent) in the thermodynamic limit for allp > (i.e., for an arbitrary 
small probability to utilize the random links). Figure Eb) shows the structure factor for the 
small- world network with p = 0.1, confirming the prediction of Eq. (|25|) for small wave 
numbers. Consequently, the height-height correlations decay exponentially 

{nn+i) oc eii e-l'l/«ii , (26) 

and the width remains finite, {w'^{L, oo)) ^ ^\\, where ^|| is independent of the systems size 
for all p > 0. Further, for the structure factors of the local slopes (the Fourier transform of 
the slope-slope correlations) one obtains 

^W(fc) = j{4>k4>~k) = k^S^^\k) oc 4^ = 1 - . (27) 

L 7 -|- fc^ + k'^ 

Both terms above yield short-range correlations (delta function for the first term and expo- 
nential decay for the second one), thus, the slopes remain short-range correlated, resulting 
in a non-zero density of local minima. Figure IHl shows two snapshots of the virtual time 
horizons for the basic conservative scheme p = 0, and the small- world scheme with p = 0.1. 
Figure [7^) shows the scaling of the steady state width with the system size for various p 
values and Fig|7|D) shows the scaling of the average, steady state utilization with the system 
size for the same set of p values. Notice that when increasing p (from p=0 to p=0.01), 
the width instantaneously drops from a linear divergence to a saturated value, while at the 
same time, the utilization hardly changes. In fact, an infinitesimally small p will make the 
width bounded, but at an only infinitesimal expense to the utilization. For example, for a 
hypothetically infinite system, taking p=0.01, the width is reduced from infinity to about 
40, while the utilization from 0.2464 only to about 0.246; for p=0.1, the width is further 
reduced to about 5, while the utilization only to 0.242. By further increasing p, the width 
further reduces, and at p=l it is about 1.46, whereas the utilization decreases to 0.141, still 
clearly bounded away from zero in the thermodynamic limit. 
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3 Scalability of the Conservative PDES-s scheme on 
scale-free network topologies 

The Internet is a spontaneously grown collection of connected computers. The number of 
(only) webservers by February 2003 reached over 35 million The number of PC-s in use 
(Internet users) surpassed 660 million in 2002, and it is projected to surpass one billion by 
2007 jHOl • The idea for using it as a giant supercomputer is rather natural: many computers 
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Figure 8: Steady-state utilization for the scale- free BA model. 

are in an idle state, running at best some kind of screen-saver software, and the "waisted" 
computational time is simply immense. Projects such as SETI@home [3T] or the GRID 
consortium ^] are targeting to harness the power lost in screen-savers. 
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Figure 9: Behavior of the time horizon width for the scale- free BA network. The inset shows 
the scaling of the steady-state width as function of system size, N. Notice the log-lin scale 
on the inset. 

Most of the problems solved currently with distributed computation on the Internet are 
"embarrassingly parallel" , i-e., the computed tasks have little or no connection to each 
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other similar to starting the same run with a number of different random seeds, and at the 
end collecting the data to perform statistical averages. However, before more large-scale, 
complex problems can be solved in real time on the Internet a number of challenges have 
to be solved, such as the task allocation problem which is rather complex by itself [TTj . 

Here we ask the following question: Given that task allocation is resolved and the PE 
communication topology on the internet is a scale-free network, what are the scalability 
properties of a PDES-s scheme on such networks? Here we present numerical results, for 
the PDES-s update scheme, as measured on a model of scale-free networks, namely the 
Barabasi- Albert model [SHIHSI- This network is created through the stochastic process of 
preferential attachment: to the existing network at time t oi N nodes, attaches the + 1-th 
node with m links ("stubs") at time t + 1, such that each stub attaches to a node with 
probability proportional to the existing degree (at t) of the node. Here we will only present 
the m = 1 case, when the network is a scale-free tree. Once we reach a given number of nodes 
in the network, we stop the process and use the random network instance to run the MPEU 
model on top of it, using the evolution equation (jHl) for the time horizon. While in case of 
regular topologies, the degree of a node is constant, e.g. for (i-dimensional "square" lattices, 
P(^')(fc) = 2d5k,2d , for the BA network, it is a power law in the asymptotic (A^ — >• oo) 
limit : P^^{k) ~ 2m?k~^ . The condition (fTJ) for a site to be updated, i.e., that its virtual 
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Figure 10: Connectivity-utilization Uk and relative connectivity utilization as function of 
degree. Each data set is obtained after averaging over 200 independent runs. 

time is a local minimum, is a local property, and thus we expect that the utilization itself 
be correlated with local structural properties of the graph, such as the degree distribution. 
To get a more detailed picture, we define two more quantities: 
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1) connectivity-utilization: 



MN,t) = \^ (28) 



which is the fraction of active nodes of degree k, and 
2) the relative connectivity-utilization: 

n{N,t)=^-^ (29) 

which is the fraction of active nodes of degree k within the set of all nodes of degree k. 
From the abobe definitions we find the following ovious relations: Ylk''^k{N,t) = u{N,t) 
and ^,rk{N,t)Nk/N = Y.k^k{N,t)P^''{k) = EkMN^t) = n(iV,t) = (r,(iV, t))„,,^,,, 
at all times. Figure |H1 shows the steady state {t —>■ oo, in the MPEU model on a fixed 
BA network of nodes) values of the average utilization as function of the network size 
N. The inset in Fig. |H] is analogous to Fig. Efa) which showed the same quantity on a 
ring. Notice that strictly speaking, the PDES-s scheme is computationally non-scalable. 
An empirical fit suggest that u*{N) = {u{N,t=oo)) ~ [in (aA^*)] with a ~ 3.322 and 
b = 0.902, i.e., the computation is only logarithmically (or marginally) non-scalable. For a 
system of A^=10^ nodes we have found a steady state utilization (for the worst case scenario) 
of M*(10^) = 0.1328 (13.3% efficiency), while for a system of a million nodes, A^=10^, the 
utilization dropped only to m*(10^) = 0.073 (7.3% efficiency), by less than half of its value! 
For practical purposes the PDES-s scheme can be considered computationally scalable, and 
this type of non-scalability we will call logarithmic (or marginal) non-scalability. 

Figure IHl shows the scaling of the width of the fiuctuations for the time horizon as func- 
tion of time, and the scaling of its value in the steady-state as function of system size 
(inset). Notice, that while the steady state width diverges to infinity, it only does so log- 
arithmically, {w^{N,t=oo)) ~ [in (cA^'^)] with c ~ 1.25 and d =~ 0.401. Some specific 
values: (w^(10^, t=oo)) ~ 3.01, (w^(10^, t=oo)) ~ 4.78. This means that the measure- 
ment phase of the PDES-s scheme on a scale-free network is non-scalable either, however, 
it is so only logarithmically, and for practical purposes the scheme can be considered scal- 
able. Overall, the PDES-s update scheme has logarithmic (or marginal) non-scalability on 
scale-free networks. If one examines the connectivity-utilization and relative connectivity- 
utilization in the steady state, as shown in Fig. one finds that with good approximation 
ul{N) ~ k~^, and rl{N) = const. ~ u*{N) for k < k^ and rl{N) ~ k~'^ for k > k^, with 
kx ~ l/u*{N) = In (aA^^) ~ InA^, being a crossover degree. 



4 Conclusions 

We studied the fundamental scalability problem of conservative PDES schemes where events 
are self-initiated and have identical distribution on each PE. First, we considered the scala- 
bility of the basic conservative scheme for systems with short-range interactions on regular 
lattices. By exploiting a mapping [Sj between the progress of the simulation and kinetic 
roughening in non-equilibrium surfaces, we found that while the average progress rate of 
the PEs (m(oo, oo)) is a finite non-zero value, the spread of the progress of the PEs about 
the mean (w^(oo, oo)) diverges. The former property makes the measurement phase of the 
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algorithm non-scalable. In order to make the measurement part of the simulation scalable 
as well, we introduced ^ quenched random connections between PEs (exactly one for each) 
so that the resulting random links on the top of the regular short-range connections formed 
a small-world-like connection topology. Invoking the same conservative protocol used at 
an arbitrarily small rate through the random links was sufficient to achieve full scalability: 
the PEs progress at a non-zero, near uniform rate without requiring global synchronization 
[S]. The above construction of a fully scalable algorithm for simulating large systems with 
asynchronous dynamics and short-range interactions is an example for the enormous "com- 
putational power and synchronizability" [2^] that can be achieved by small-world couplings. 
The suppression of critical fluctuations of the virtual time horizon is also closely related to 
the emergence of mean-field-like phase transitions and phase ordering in non-frustrated in- 
teracting systems jSZl EHl (HHl HOI lU] ■ In particular, the fiuctuations exhibited by the virtual 
time horizon with small-world synchronization should exhibit very similar characteristics to 
the fiuctuations of the order parameter in the XY-model placed on a small-world network 

Second, we studied the scalability properties for a causally constrained PDES scheme 
hosted by a network of computers where the network is scale-free following a "preferential 
attachment" construction ,34,.35j- Here the PEs simply have to satisfy the general criterion 
Eq. in order to advance their local time. Despite some nodes in the network having 
abnormally large connectivity (as a result of the scale-free nature of the degree distribution), 
we found that the computational phase of the algorithm is only marginally non-scalable. 
The utilization exhibited slow logarithmic decay as a function of the number of PEs. At 
the same time, the width of the time horizon diverged logarithmically slowly, rendering 
the measurement phase of the simulations marginally non-scalable as well. An intriguing 
question to pursue is how the logarithmic divergence of the surface fiuctuations observed 
here can be related to the collective behavior (in particular, the finite-size effects of the 
magnetic susceptibility) of Ising ferromagnets on scale-free networks UHl IMl 1^ with 
the same degree distribution. 
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